Modeling the Synchrony between Audio and Visual Modalities for Speaker Identification

Authors

Yu Wang

Zhiyong Wu

Lianhong Cai

Helen M. Meng

Abstract

This work aims to understand and model the inter-modal temporal relations between the audio and visual modalities of speech, and to validate whether the captured relations can improve the performance of audio-visual bimodal modeling for applications such as audio-visual speaker identification. We propose to extend our audio-visual correlative model (AVCM) with explicit durational modeling of the partial temporal synchrony between the two speech modalities, i.e., where the audio may lead, lag, or remain synchronized with the video. We refer to the extended model as Durational-AVCM. Experiments on the CMU database and a homegrown database demonstrate that Durational-AVCM improves the accuracy of audio-visual speaker identification over the original AVCM at all acoustic signal-to-noise ratio (SNR) levels from 0 dB to 30 dB under varying acoustic conditions. The results indicate the importance of incorporating the partial temporal synchrony between the audio and visual modalities in audio-visual bimodal modeling.

Similar resources

Audio-Visual Correlation Modeling for Speaker Identification and Synthesis

This thesis addresses two major problems of multimodal signal processing using audiovisual correlation modeling: speaker recognition and speaker synthesis. We address the first problem, i.e., the audiovisual speaker recognition problem within an open-set identification framework, where audio (speech) and lip texture (intensity) modalities are fused employing a combination of early and late inte...

Detecting audio-visual synchrony using deep neural networks

In this paper, we address the problem of automatically detecting whether the audio and visual speech modalities in frontal pose videos are synchronous or not. This is of interest in a wide range of applications, for example spoof detection in biometrics, lip-syncing, speaker detection and diarization in multi-subject videos, and video data quality assurance. In our adopted approach, we investig...

Audio-Visual Speaker Identification via Adaptive Fusion Using Reliability Estimates of Both Modalities

An audio-visual speaker identification system is described, where the audio and visual speech modalities are fused by an automatic unsupervised process that adapts to local classifier performance, by taking into account the output score based reliability estimates of both modalities. Previously reported methods do not consider that both the audio and the visual modalities can be degraded. The v...

Robust audio-visual speech synchrony detection by generalized bimodal linear prediction

We study the problem of detecting audio-visual synchrony in video segments containing a speaker in frontal head pose. The problem holds a number of important applications, for example speech source localization, speech activity detection, speaker diarization, speech source separation, and biometric spoofing detection. In particular, we build on earlier work, extending our previously proposed ti...

Audio-visual synchronisation for speaker diarisation

The role of audio–visual speech synchrony for speaker diarisation is investigated on the multiparty meeting domain. We measured both mutual information and canonical correlation on different sets of audio and video features. As acoustic features we considered energy and MFCCs. As visual features we experimented both with motion intensity features, computed on the whole image, and Kanade Lucas T...

Publication date: 2008
